Summary. Composite indicators aggregate a set of variables using weights which are
understood to reflect the variables' importance in the index. In this paper we propose
to measure the importance of a given variable within existing composite indicators via
Karl Pearson's 'correlation ratio'; we call this measure 'main effect'. Because
socio-economic variables are heteroskedastic and correlated, relative nominal weights are
hardly ever found to match relative main effects; we propose to summarize their
discrepancy with a divergence measure. We discuss to what extent the mapping from
nominal weights to main effects can be inverted. This analysis is applied to six
composite indicators, including the Human Development Index and two popular league
tables of university performance. It is found that in many cases the declared
importance of single indicators and their main effect are very different, and that the data
correlation structure often prevents developers from obtaining the stated importance,
even when modifying the nominal weights in the set of nonnegative numbers with unit
sum.



1. Introduction 



In social sciences, composite indicators aggregate individual variables with the
aim to capture relevant, possibly latent, dimensions of reality such as a
country's competitiveness (World Economic Forum (2010)), the quality of its
governance (Agrast et al. (2010)), the freedom of its press (Reporters Sans Frontières
(2011); Freedom House (2011)) or the efficiency of its universities or school system
(Leckie and Goldstein (2009)). These measures have been termed 'pragmatic' (see
Hand (2009), pp. 12-13), in that they answer a practical need to rate individual
units (such as countries, universities, hospitals or teachers) for some assigned
purpose.

Composite indicators (which are also referred to here as indices) have been
increasingly adopted by many institutions, both for specific purposes (such as to
determine eligibility for borrowing from international loan programs) and for
providing a measurement basis for shaping broad policy debates, in particular in the
public sector (Bird et al. (2005)). As a result, public interest in composite indicators
has enjoyed a fivefold increase over the period 2005-2010: a search of 'composite
indicators' on Google Scholar gave 992 matches in October 2005 and 5,340 at the
time of the first version of this paper (December 2010).

Composite indicators are fraught with normative assumptions in variable
selection and weighting. Here 'normative' is understood to be 'related to and dependent
upon a system of norms and values'. For example, the proponents of the Human
Development Index (HDI) advocate replacing gross domestic product (GDP) per
capita as a measure of the progress of societies with a combination of (i) GDP
per capita, (ii) education and (iii) life expectancy, see Ravallion (2010). Both the
selection of these three specific dimensions and the choice of building the index by
giving these dimensions equal importance are normative, see Stiglitz et al. (2009),
p. 65. Composite indicators are thus often the subject of controversy, see Saltelli
(2007); Hendrik et al. (2008).

The statistical analysis of composite indicators is essential to prevent media and
stakeholders taking them at face value (see the recommendations in Organisation for
Economic Co-operation and Development (2008)), possibly leading to questionable
policy choices. For example, a policy maker might think of merging higher education
institutions just because the most popular league table of universities puts a
premium on larger universities, see Saisana et al. (2011).

Most existing composite indicators are linear, i.e. weighted arithmetic averages
(Organisation for Economic Co-operation and Development (2008)). Linear
aggregation rules have been criticized because weaknesses in some dimensions are
compensated by strengths in other dimensions; this characteristic is called
'compensatory'. Non-compensatory and non-linear aggregate ranking rules have been
advocated by the literature on multicriteria decision making, see for example Billaut et al.
(2010), Munda (2008), Munda and Nardo (2009), Balinski and Laraki (2010). In
this paper we concentrate on linear aggregation, because of its widespread use.

In this paper we address the issue of measuring variable importance in existing 
composite indicators. As illustrated by a motivating example at the end of this sec- 
tion, nominal weights are not a measure of variable importance, although weights 
are assigned so as to reflect some stated target importance, and they are
communicated as such. In linear aggregation, the ratio of two nominal weights gives the
rate of substitutability between the two individual variables, see Bouyssou et al.
(2006), Chapter 4, or Decancq and Lugo (2010), and hence can be used to reveal
the target relative importance of individual indicators. This target importance can
then be compared with ex-post measures of variables' importance, such as the one 
presented in this paper. 

We propose to measure the importance of a given variable via Karl Pearson's 
'correlation ratio', which is widely applied in global sensitivity analysis as a first- 
order sensitivity measure; we call this measure 'main effect'. Main effects represent 
the expected relative variance reduction obtained in the output (the index) if a
given input variable could be fixed (Saltelli and Tarantola (2002), see Section 3.1).
They are based on the statistical modelling of the relation between the variable and 
the index. 

This statistical modelling can be parametric or non-parametric; we compare a 
linear and a non-parametric alternative based on local-linear kernel smoothing. We 
apply the main effects approach to six composite indicators, including the HDI and 
two popular league tables of university performance. We find that in some cases, a 
linear model can give a reasonable estimate of the main effects, but in other cases 
the non-parametric fit must be preferred. Further, we find that nominal weights 
hardly ever coincide with main effects. We propose to summarize this deviation in 
a discrepancy statistic, which can be used by index developers and users alike to
gauge the gap between the effective and the target importance of each variable. 

We also pose the question of whether the target importance stated by the de- 
velopers is actually attainable by appropriate choice of nominal weights; we call 
this the 'inverse problem'. We find that in most instances the correlation structure 
prevents developers from obtaining the stated importance by changing the nominal 
weights within the set of nonnegative numbers with sum equal to 1. These findings 
may offer a useful insight to users and critics of an index, and a stimulus to its 
developers to try alternative, possibly non-compensatory, aggregation strategies. 

Our proposed measure of importance is also in line with current practice in
Sensitivity Analysis. Recently, some of the present authors proposed a global sensitivity
analysis approach to test the robustness of a composite indicator, see Saisana et al.
(2005, 2011); this approach performs an error propagation analysis of all sources
of uncertainty which can affect the construction of a composite indicator. This 
analysis might be called 'invasive' in that it demands all sources of uncertainty to 
be modeled explicitly, e.g. by assuming alternative methods to impute missing val- 
ues, different weights, different aggregation strategies; the method may also test the 
effect of including or excluding individual variables from the index. 

In contrast, the approach suggested in this paper is non-invasive, because it 
does not require explicit modeling of uncertainties. The proposed measure also 
requires minimal assumptions, in the sense that it exists whenever second moments 
exist. Moreover, it takes the data correlation structure into account. When this 
analysis is performed by the developers themselves, it adds to the understanding,
and ultimately to the quality, of the index. When performed ex-post by a third
party on an already developed index, this procedure may reveal unnoticed features
of the composite indicator.

The paper is organized as follows: the rest of Section 1 reports a motivating
example and discusses related work. Section 2 describes linear composite indicators.
Section 3 defines the main effects and discusses their estimation. It also defines
a discrepancy statistic between main effects and nominal weights. Finally it
discusses the inversion of the map from nominal weights to main effects. Section 4
presents detailed results for six indices: the 2009 Human Development Index (2009
HDI), the Academic Ranking of World Universities by Shanghai's Jiao Tong
University (ARWU), the university ranking by the Times Higher Education Supplement
(THES), the 2010 Human Development Index (2010 HDI), the Index of African
Governance (IAG) and the Sustainable Society Index (SSI). Section 5 contains a
discussion and conclusions. A solution to the inverse problem is reported in the
Appendix.

1.1. Motivating example 

In weighted arithmetic averages, nominal weights are communicated by develop- 
ers and perceived by users as a form of judgement of the relative importance of 
the different variables, including the case of equal weights where all variables are 
assumed to be equally important. When using 'budget allocation', a strategy to 
assign weights, experts are given a number of tokens, say 100, and asked to ap- 
portion them to the variables composing the index, assigning more tokens to more 
important variables. This is a vivid example of how weights are perceived and used 
as measures of importance. However, the relative importance of variables depends 
on the characteristics of their distribution (after normalization) as well as their
correlation structure, as we illustrate with the following example. This gives rise to
a paradox: weights are perceived by users as reflecting the importance of a
variable, yet this perception can be grossly off the mark.

Consider a University Dean who is asked to evaluate the performance of faculty
members, giving equal importance to indicators of publications $x_1$, of teaching $x_2$
and of office hours and administrative work $x_3$. Hence she considers an
equally-weighted index, $y = \frac{1}{3}(x_1 + x_2 + x_3)$, and she employs
$R_i^2 := \mathrm{corr}^2(y, x_i)$ in order to measure the association between the
index $y$ and each of the $x_i$ variables ex post.

We consider two different situations, which illustrate the influence of variances
and of correlations of the $x$ variables on the faculty performance index. In
both situations, we let the variables $x_1, x_2, x_3$ be jointly normally distributed with
mean zero. First assume that the variance of $x_1$ is equal to 7 while $x_2$ and $x_3$ have
unit variances, and that the $x_i$ variables are uncorrelated; the value 7 is chosen here
in order to make the variance of $y$ equal to 1. Since in this case
$\mathrm{cov}(y, x_i) = \sigma_{ii}/3$, where $\sigma_{ii}$ is the variance of $x_i$,
we find $R_i^2 = \sigma_{ii}/9$, that is

$$R_1^2 = \frac{7}{9} \approx 0.778, \qquad R_2^2 = R_3^2 = \frac{1}{9} \approx 0.111,$$

which implies that the importance (as measured by $R_i^2$) of the variables $x_2$ and $x_3$
relative to $x_1$ is equal to $1/7 \approx 0.143$. This shows how variances can greatly affect
this measure of importance. We conclude that the Dean needs to do something
about the indicators' variances before computing the index.

Changing the weights from $1/3$ to $1/(c\sqrt{\sigma_{ii}})$, where
$c := \sum_{t=1}^{3} 1/\sqrt{\sigma_{tt}}$ and $\sigma_{ii}$
is the variance of $x_i$, would compensate for unequal variances; this corresponds to
standardizing indicators before aggregation. In current practice, builders of composite
indicators prefer to normalise indicators before aggregation, for instance dividing
by the highest score. Going back to the Dean's example, the yearly number of
administration hours can be divided by the total number of hours within a year,
delivering $x_3$ as the fraction of administration hours. We remark that, in general,
normalised scores present different variances.

Consider next the situation where $x_1, x_2, x_3$ are standardized, i.e. all have unit
variances. Assume also that the correlations $\rho_{ij} := \mathrm{corr}(x_i, x_j)$ are all equal to zero,
except $\rho_{23} = \rho_{32} > 0$. Simple algebra shows that

$$R_1^2 = \frac{1}{3 + 2\rho_{23}}, \qquad R_2^2 = R_3^2 = \frac{(1 + \rho_{23})^2}{3 + 2\rho_{23}}, \qquad \frac{R_1^2}{R_2^2} = \frac{1}{(1 + \rho_{23})^2},$$

i.e. that the importance of indicators $x_2$ and $x_3$ is the same; this is a general
property of standardized indicators. Note that the importance of indicators $x_2$ and
$x_3$ is greater than that of $x_1$, because $\rho_{23} > 0$. Taking for instance $\rho_{23} = 0.7$,
one finds

$$R_1^2 = \frac{5}{22} \approx 0.227, \qquad R_2^2 = R_3^2 = \frac{289}{440} \approx 0.657, \qquad \frac{R_1^2}{R_2^2} = \frac{100}{289}.$$



One may well imagine a faculty member looking at the relative importance of $x_1$
with respect to $x_2$, complaining that research has become dispensable because,
although the index's formula seems to suggest that all variables are equally important,
in fact teaching is valued more than publications by a factor of about 3. In this second
situation, even if the Dean has standardized the variables measuring publications
$x_1$, teaching $x_2$ and administration $x_3$, the last two have a higher influence on the
faculty performance indicator $y$ due to their correlation.
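
The second situation can be checked numerically. The sketch below is only an illustration in plain Python: the helper name `squared_correlations` is hypothetical, and it simply evaluates $\mathrm{corr}^2(y, x_i)$ from the weights and an assumed covariance matrix of the $x$ variables.

```python
def squared_correlations(weights, cov):
    """Return R_i^2 = corr^2(y, x_i) for y = sum_i w_i x_i, given Cov(x)."""
    k = len(weights)
    # cov(y, x_i) = sum_t w_t * sigma_ti
    cov_y_x = [sum(weights[t] * cov[t][i] for t in range(k)) for i in range(k)]
    # V(y) = w' Sigma w
    var_y = sum(weights[i] * cov_y_x[i] for i in range(k))
    return [cov_y_x[i] ** 2 / (cov[i][i] * var_y) for i in range(k)]


# Second situation: standardized variables, only x_2 and x_3 correlated.
rho = 0.7
sigma = [[1.0, 0.0, 0.0],
         [0.0, 1.0, rho],
         [0.0, rho, 1.0]]
r2 = squared_correlations([1 / 3, 1 / 3, 1 / 3], sigma)
print([round(v, 3) for v in r2])  # [0.227, 0.657, 0.657]
print(round(r2[1] / r2[0], 2))    # 2.89
```

The same helper accepts any weight vector and covariance matrix, so the effect of unequal variances can be explored by changing the diagonal of `sigma`.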






This example describes different situations which generate the paradox. The 
occurrence of different variances is one such situation; this is a problem also in 
practice, because usually individual indicators are normalized to be between 0 and
1 or 0 and 100, and hence they have different variances in general. Also when
correcting for different variances using standardized indicators, however, the para- 
dox can be generated by correlations. This is of practical concern as well, because 
different individual indicators are usually correlated. 

The paradox illustrated by the preceding example equally applies when the 
index's architecture is made of pillars, each pillar aggregating a subset of variables. 
A hypothetical sustainability index could have environmental, economic, social
and institutional pillars, and equal weights for these four pillars would flag the 
developers' belief that these dimensions share the same importance. Still one of the 
four pillars with a weighting in principle of 25% could contribute little or nothing 
to the index, e.g. because the variance of the pillar is comparatively small and/or 
the pillar is not correlated to the remaining three. A case study of this nature is 
discussed later in the present work. 



1.2. Related work 

The connection of the present paper with global sensitivity analysis has been
discussed above. A related approach to measure variable importance in linear
aggregations is the one of 'effective weights', introduced in the psychometric literature
by Stanley and Wang (1968), Wang and Stanley (1970). The effective weight of a
variable $x_i$ is defined as the covariance between $w_i x_i$ and the composite indicator
$y = \sum_{i=1}^{k} w_i x_i$ divided by the variance of $y$, i.e.
$\epsilon_i := \mathrm{cov}(y, w_i x_i)/V(y)$. The same
approach has been employed in recent literature in global sensitivity analysis, see
e.g. Li et al. (2010).

Effective weights $\epsilon_i$ are, however, not necessarily positive, and hence they make
an improper apportioning of the variance $V(y)$: $\epsilon_i$ cannot be interpreted as a 'bit'
of variance. On the contrary, the measure of importance $S_i$ proposed in this paper
(i.e. Pearson's correlation ratio) is never negative and can be interpreted as the
fractional reduction in the variance of the index that could be achieved (on average)
if variable $x_i$ could be fixed. $S_i$ also fits into an ANOVA variance decomposition
framework, see Saltelli (2002) for a discussion.
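
A small numerical sketch may help to see why effective weights can fail as a share of variance. The two-variable setting below (weights 0.8 and 0.2, standardized variables with correlation -0.9) is purely hypothetical, and the function name `effective_weights` is an assumption made for illustration; the squared correlation is used as the linear-case value of the importance measure.

```python
def effective_weights(weights, cov):
    """Return (eps, r2): effective weights cov(y, w_i x_i)/V(y) and corr^2(y, x_i)."""
    k = len(weights)
    cov_y_x = [sum(weights[t] * cov[t][i] for t in range(k)) for i in range(k)]
    var_y = sum(weights[i] * cov_y_x[i] for i in range(k))
    eps = [weights[i] * cov_y_x[i] / var_y for i in range(k)]
    r2 = [cov_y_x[i] ** 2 / (cov[i][i] * var_y) for i in range(k)]
    return eps, r2


# Hypothetical example: two standardized, negatively correlated variables.
sigma = [[1.0, -0.9],
         [-0.9, 1.0]]
eps, r2 = effective_weights([0.8, 0.2], sigma)
print([round(v, 3) for v in eps])  # [1.265, -0.265]
print([round(v, 3) for v in r2])   # [0.981, 0.69]
```

In this sketch the effective weights sum to one by construction, yet the second one is negative, whereas the squared correlations stay between 0 and 1.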

Moreover, effective weights assume that the dependence structure of the
variables $x_i$ is fully captured by their covariance structure, as in linear regression. As
we show in the following, the relation between the index and its components may
well be nonlinear, and the measure of importance proposed in this paper extends to
this case as well. The case-studies reported in Section 4 show that nonlinearity is
often the rule rather than the exception. In the case of a linear relation between $y$
and $x_i$, our measure $S_i$ reduces to $R_i^2$, the square of $\mathrm{corr}(y, x_i)$, used in the example
above; hence in this case, the present approach leads to a simple transformation of
the effective weights.



For some indices, such as the Product Market Regulation Index (see Nicoletti et al.
(2000)), Principal Component Analysis (PCA) has been used to select aggregation
weights. PCA chooses weights that maximize (minimize) the variance of the index,
and hence weights do not reflect the normative aspects of the definition of the
index. Consequently, weights are difficult to interpret and to communicate, and as a
result the use of PCA in this context is not widespread. The same Product Market
Regulation Index moved from the use of PCA to a simpler and more transparent
technique for linear aggregation after a statistical analysis of the implications of
such a change (Nardo (2009)).



2. Weights and importance 

Consider the case of a composite indicator $y$ calculated as a weighted arithmetic
average of $k$ variables $x_i$

$$y_j = \sum_{i=1}^{k} w_i x_{ji}, \qquad (1)$$

where $x_{ji}$ is the normalized score of individual $j$ (e.g., country) based on the value
$X_{ji}$ of variable $X_{\cdot i}$, $i = 1, \dots, k$ and $w_i$ is the nominal weight assigned to variable
$X_{\cdot i}$. The most common approach is to normalize original variables, see Bandura
(2008), by the min-max normalisation method

$$x_{ji} = \frac{X_{ji} - X_{\min,i}}{X_{\max,i} - X_{\min,i}}, \qquad (2)$$

where $X_{\max,i}$ and $X_{\min,i}$ are the upper and lower values respectively for the variable
$X_{\cdot i}$; in this case all scores $x_{ji}$ vary in $[0, 1]$. Here we indicate the transformation
(2) as 'normalisation'; the normalised variables in (2) are denoted as $x_{\cdot i}$. We let
$\mu_i := E(x_{ji})$ and $\sigma_{ii} := V(x_{ji})$ indicate their expectation and variance respectively.
In the following, we replace $X_{\cdot i}$ and $x_{\cdot i}$ by $X_i$ and $x_i$ respectively, unless needed for
clarity.

Observe that the normalisation (2) implies a fixed scale of the individual
indicators; this is useful for instance for comparability in repeated waves of the same
index. However, normalisation does not imply any standardization of different $x_{\cdot i}$
variables, which hence have different means $\mu_i$ and variances $\sigma_{ii}$ in general.

A popular alternative to the min-max normalization in (2) is given by
standardization

$$x_{ji} = \frac{X_{ji} - E(X_{ji})}{\sqrt{V(X_{ji})}}, \qquad (3)$$

where $E(X_{ji})$ and $V(X_{ji})$ are the mean and variance of the original variables $X_{\cdot i}$.
When standardized, all $x_i$ have the same mean and variance, $\mu_i = 0$, $\sigma_{ii} = 1$ for all
$i$, removing one source of heterogeneity among variables. However, standardization
does not affect the correlation structure of the variables $X_i$ (or $x_i$). Both
transformations (2) and (3) are invariant to the choice of unit of measurement of $X_i$, see
Hand (2009), Chapter 1.

While standardization may appear a better approach than normalization from a
statistical point of view, both have advantages and disadvantages. For example,
standardization may be expected not to work so well when the distribution is very skewed
or long tailed. Moreover it does not enhance comparison across different waves of
the same aggregate indicator over the years, if the mean and variances used in (3)
change over time. Also one cannot achieve both standardization and normalization
at the same time through a linear transformation of $X_i$. This implies that index
developers suffer the unwanted disadvantages of the chosen transformation.
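
The difference between the two transformations can be illustrated with a short plain-Python sketch. The raw scores below are hypothetical, the function names are chosen only for this example, and population (rather than sample) variances are assumed.

```python
import statistics


def min_max(values):
    """Min-max normalisation: rescale the scores to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical raw scores of two variables observed on five units.
raw_a = [50.0, 60.0, 70.0, 80.0, 90.0]
raw_b = [30.0, 95.0, 97.0, 98.0, 99.0]

norm_a, norm_b = min_max(raw_a), min_max(raw_b)
std_a, std_b = standardize(raw_a), standardize(raw_b)

# After min-max both variables share the range [0, 1] but not the variance.
print(round(statistics.pvariance(norm_a), 3), round(statistics.pvariance(norm_b), 3))
# After standardization both variances are equal to 1 (up to rounding).
print(round(statistics.pvariance(std_a), 3), round(statistics.pvariance(std_b), 3))
```

Both transformations are linear in the raw scores, so the correlation between the two variables is the same before and after either of them.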

Whatever the transformation, in the following we denote the column vector of
scores of unit $j$ as $x_j := (x_{j1}, \dots, x_{jk})'$ and indicate by $\mu := (\mu_1, \dots, \mu_k)'$ and
$\Sigma := (\sigma_{it})_{i,t=1}^{k}$ the corresponding vector of means and the implied variance-covariance
matrix. The weight, $w_i$, attached to each variable, $x_i$, in the aggregate is meant
to reflect the importance of that variable with respect to the concept being
measured. The vector of weights $w := (w_1, \dots, w_k)'$ is selected by developers on
the basis of different strategies, be those statistical, such as PCA, or based on expert
evaluation, such as analytic hierarchy process, see Saaty (1980, 1983).



In what follows we indicate by $\zeta^2_{i\ell}$ the target relative importance of indicators $i$
and $\ell$. When this is not explicitly stated, the ratios $w_i/w_\ell$ can be taken to be the
'revealed target relative importance'. In fact $w_i/w_\ell$ is a measure of the substitution
effect between $x_i$ and $x_\ell$, i.e. how much $x_\ell$ must be increased to offset or balance
a unit decrease in $x_i$, see Decancq and Lugo (2010). For simplicity of notation
and without loss of generality, we assume that the maximal weight is assigned to
indicator 1, i.e. that $w_1 \geq w_i$ for $i = 2, \dots, k$, and we consider $\zeta^2_i := \zeta^2_{i1}$.

Note that the previous discussion applies to pillars as well as to individual vari- 
ables, where a pillar is defined as an aggregated subset of variables, identified by 
the developers as representing a salient - possibly latent, or normative - dimension 
of the composite indicator. 



3. Measuring importance 



3.1. Measures of importance

In this paper we propose a variance-based measure of importance. We note that

$$E(y) = w'\mu, \qquad V(y) = w'\Sigma w, \qquad (4)$$

where, if (3) is used, $E(y) = 0$ and the diagonal elements of $\Sigma$ are equal to 1; here
we have dropped the subscript $j$ in $y_j$ for conciseness. In the following, we focus
attention on the variance term.



Following Pearson (1905), we consider the question 'what would be the average
variance of $y$, if variable $x_i$ were held fixed?' This question leads to consider

$$E_{x_i}\left(V_{x_{-i}}(y \mid x_i)\right),$$

where $x_{-i}$ is defined as the vector containing all the variables in $x$ except variable
$x_i$. Owing to the well known identity

$$V_{x_i}\left(E_{x_{-i}}(y \mid x_i)\right) + E_{x_i}\left(V_{x_{-i}}(y \mid x_i)\right) = V(y)$$

we can define the ratio of $V_{x_i}(E_{x_{-i}}(y \mid x_i))$ to $V(y)$ as a measure of the relative
reduction in variance of the composite indicator to be expected by fixing a variable,

$$S_i \equiv \eta_i^2 := \frac{V_{x_i}\left(E_{x_{-i}}(y \mid x_i)\right)}{V(y)}. \qquad (5)$$

The notation $S_i$ reflects the use of this measure as a first order sensitivity measure
(also termed 'main effect') in sensitivity analysis, see Saltelli and Tarantola (2002).
The notation $\eta_i^2$ reflects the original notation used in Pearson (1905); he called it
'correlation ratio $\eta^2$'.

The conditional expectation $E_{x_{-i}}(y \mid x_i)$ in the numerator of (5) can be any
nonlinear function of $x_i$; in fact
$f_i(x_i) := E_{x_{-i}}(y \mid x_i) = w_i x_i + \sum_{\ell=1, \ell \neq i}^{k} w_\ell E_{x_{-i}}(x_\ell \mid x_i)$,
where the latter conditional expectations may be linear or nonlinear in $x_i$. For the
connection of $f_i(x_i)$ to global sensitivity analysis see Saltelli et al. (2008).

In the special case of $f_i(x_i)$ linear in $x_i$, we find that $S_i$ reduces to $R_i^2$, where
$R_i$ is the product-moment correlation coefficient of the regression of $y$ on $x_i$. In
fact, it is well known that when $f_i$ is linear, i.e. $f_i(x_i) = \alpha_i + \beta_i x_i$, it coincides
with the $L_2$ projection of $y$ on $x_i$, which implies that $\beta_i = \mathrm{cov}(y, x_i)/\sigma_{ii}$, see e.g.
Wooldridge (2010). Hence $S_i$ has the form $S_i = V_{x_i}(\beta_i x_i + \alpha_i)/V(y)$ and one finds

$$S_i = \beta_i^2 \sigma_{ii}/V(y) = \mathrm{cov}^2(y, x_i)/(\sigma_{ii} V(y)) = R_i^2.$$

A further special case corresponds to $f_i$ linear and $x$ made of uncorrelated
components. We find $\mathrm{cov}(y, x_i) = \sum_{t=1}^{k} w_t \sigma_{ti} = w_i \sigma_{ii}$ and
$V(y) = \sum_{t=1}^{k} w_t^2 \sigma_{tt}$, so
$S_i = w_i^2 \sigma_{ii} / \sum_{t=1}^{k} w_t^2 \sigma_{tt}$. The main difference between the
uncorrelated and the correlated case is that in the former $\sum_{i=1}^{k} S_i = 1$, because
$S_i = w_i^2 \sigma_{ii} / \sum_{h=1}^{k} w_h^2 \sigma_{hh}$,
while for the latter $\sum_{i=1}^{k} S_i$ might well exceed one, see e.g. Saltelli and Tarantola
(2002). We note that in general $S_i$ can still be high also when $R_i^2$ is low, e.g. in
case of a non-monotonic U-shaped relationship for $f_i(x_i)$. Hence in general $f_i(x_i)$
needs to be estimated in a nonparametric way, see Section 3.2.
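
The remark on U-shaped relationships can be illustrated on simulated data. The sketch below is not the estimator used in this paper: as a deliberately crude stand-in it approximates the correlation ratio by the between-bin share of variance over equal-width bins of $x$. The data-generating process, the number of bins and all function names are assumptions made for this illustration.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation R^2 between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def binned_correlation_ratio(x, y, bins=20):
    """Crude estimate of V(E(y | x)) / V(y) from equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    groups = [[] for _ in range(bins)]
    for xv, yv in zip(x, y):
        idx = min(int((xv - lo) / width), bins - 1)  # the maximum goes to the last bin
        groups[idx].append(yv)
    y_bar = sum(y) / len(y)
    between = sum(len(g) * (sum(g) / len(g) - y_bar) ** 2 for g in groups if g)
    total = sum((yv - y_bar) ** 2 for yv in y)
    return between / total


# Hypothetical U-shaped relation with additive noise.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [xv ** 2 + 0.1 * rng.gauss(0.0, 1.0) for xv in x]

r2 = squared_correlation(x, y)
eta2 = binned_correlation_ratio(x, y)
print(round(r2, 3), round(eta2, 3))
```

With this simulated sample the squared correlation is close to zero while the binned correlation ratio is large, which is the situation described in the text.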

As these special cases illustrate, $S_i$ is a quadratic measure in terms of the weights
$w_i$ for linear aggregation schemes (1); this follows from its definition as a
variance-based measure. The main effect $S_i$ is an appealing measure of importance of a
variable (be it indicator or pillar) for several reasons:

• it offers a precise definition of importance of a variable, that is 'the expected
fractional reduction in variance of the composite indicator that would be
obtained if that variable could be fixed';

• it can be applied when relationships between the index and its components
are linear or nonlinear. Such nonlinearity may be the effect of nonlinear
aggregation (e.g. Condorcet-like, see Munda (2008)) and/or of nonlinear
relationships among the single variables. It can be used regardless of the degree
of correlation between variables. Unlike the Pearson or Spearman correlation
coefficients, it is not constrained by assumptions of linearity or monotonicity;

• it is not invasive, that is no changes are made to the composite indicator or to
the correlation structure of the indicators, unlike e.g. the error propagation
analysis presented in Saisana et al. (2005). While the error propagation can
be considered as a stress test of the index, the present approach is a test of
its internal coherence.



3.2. Estimating main effects 

In this subsection we consider estimating the main effects and focus on the 2009 
HDI to illustrate our approach. In Section 4 we describe the six case-studies of our
approach in detail. 

In sensitivity analysis, the estimation of $S_i$ is an active research field. $S_i$ can be
estimated from design points: Sobol' (1993), Saltelli (2002), Saltelli et al. (2010);
Fourier analysis: Tarantola et al. (2006), Plischke (2010), Xu and Gertner (2011);
or others. Many nonparametric estimators can be used to estimate $f_i(x_i)$, such as
State Dependent Regression: Ratto et al. (2007), Ratto and Pagano (2010).

In the present work we employ a nonparametric, local-linear, kernel regression to
estimate $m(\cdot) := f_i(\cdot)$, and then use it in (5) to estimate $S_i$, replacing the variances
in the numerator and denominator with the corresponding sample variances, i.e.
using

$$\frac{\sum_{j=1}^{n} (\hat{m}_j - \bar{m})^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2},$$

where $\bar{y} := n^{-1} \sum_{j=1}^{n} y_j$, $\bar{m} := n^{-1} \sum_{j=1}^{n} \hat{m}_j$,
$\hat{m}_j := \hat{m}(x_{ji})$ and $\hat{m}(\cdot)$ is the estimate of $m(\cdot) := f_i(\cdot)$.
Fig. 1. 2009 HDI ($y$), Life expectancy ($x_1$). Upper left: cross validation criterion as a function
of the smoothing parameter $h$; upper right: linearity test p-value as a function of $h$; lower
left: main effect $S_i$ as a function of $h$; lower right: cross plot of $y$ versus $x_1$ with linear fit
and local linear fits for $h_{DPI}$ (direct plug-in, dotted line) and $h_{CV}$ (cross validation, dashed
line). The values of $h_{DPI}$ and $h_{CV}$ are plotted as vertical lines in the first 3 panels (dotted
lines and dashed lines respectively).

Local linear kernel estimators achieve automatic boundary corrections and
enjoy some typical optimal properties that are superior to Nadaraya-Watson kernel
estimators, see Ruppert and Wand (1994) and references therein. As a result, local
linear kernel smoothers are often considered the standard nonparametric regression
method, see e.g. Bowman and Azzalini (1997).



The local linear nonparametric kernel regression is indexed by a bandwidth
parameter $h$, which is usually held constant across the range of values for $x_i$. For
large $h$, the local linear nonparametric kernel regression converges to the linear
least squares fit. This allows us to interpret $1/h$ as the deviation from linearity; it
suggests that we investigate the sensitivity of the estimation of $S_i$ to variation in
the bandwidth parameter $h$. In order to make this dependence explicit we write
$S_i(h)$ to indicate the value of $S_i$ obtained by a local-linear kernel regression with
bandwidth parameter $h$. In the application we use a Gaussian kernel.
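
A minimal sketch of a local-linear estimate of the main effect with a Gaussian kernel is given below, in plain Python and on simulated data. It is an illustration under stated assumptions, not the code used for the results in this paper: the function names, the simulated U-shaped relation and the bandwidth values are all hypothetical.

```python
import math
import random


def local_linear(x, y, x0, h):
    """Local-linear Gaussian-kernel estimate of E(y | x = x0) with bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xj, yj in zip(x, y):
        d = xj - x0
        k = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weight
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yj
        t1 += k * d * yj
    # Intercept of the kernel-weighted least squares line centred at x0.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def main_effect(x, y, h):
    """Sample variance of the fitted values divided by the sample variance of y."""
    m_hat = [local_linear(x, y, xj, h) for xj in x]
    m_bar = sum(m_hat) / len(m_hat)
    y_bar = sum(y) / len(y)
    num = sum((m - m_bar) ** 2 for m in m_hat)
    den = sum((v - y_bar) ** 2 for v in y)
    return num / den


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the main effect under a linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: U-shaped relation between x and y on [0, 1].
rng = random.Random(1)
x = [rng.uniform(0.0, 1.0) for _ in range(150)]
y = [(xv - 0.5) ** 2 + 0.02 * rng.gauss(0.0, 1.0) for xv in x]

for h in (0.05, 0.1, 0.5, 100.0):
    print(h, round(main_effect(x, y, h), 3))
print("linear R^2:", round(r_squared(x, y), 3))
```

For a very large bandwidth the kernel weights are nearly constant, so the estimate approaches the squared correlation of the linear fit, in line with the convergence to least squares noted above.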

The choice of the smoothing parameter $h$ can be based either on cross-validation
(CV) principles (see Bowman and Azzalini (1997)) or on plug-in choices for the
smoothing parameter, such as the ones proposed in Ruppert et al. (1995). We
describe these approaches in turn, starting with cross validation. Let $\hat{m}(x)$ indicate
the local linear nonparametric kernel estimate for $f_i(x)$ at $x_i = x$ based on all $n$
observations, and let $\hat{m}_{-j}(x)$ be the same applied to all data points except for the
one with index $j$; then the least-squares Cross Validation criterion for variable $x_i$ is
defined as

$$CV(h) := \frac{1}{n} \sum_{j=1}^{n} \left(y_j - \hat{m}_{-j}(x_{ji})\right)^2.$$



The optimal value for the CV criterion is given by the bandwidth $h_{CV}$
corresponding to the minimum of $CV(h)$. In practice, a grid $\mathcal{H}$ of possible values for $h$
is considered, and the minimum of the function $CV(h)$ is found numerically. In the
application we chose the grid of $h$ values as follows: we defined a regular grid of 50
values for $u := \sqrt{h}$ in the range from 0.1 to 5. The values for $h$ were then obtained
as $h = a + u^2/b$, for index-specific constants $a$ and $b$; the resulting set of values in
this grid is denoted $\mathcal{H}$ in what follows.

The default values for indices with range from 0 to 10 or 100 were $a = 0.05$,
$b = 1$, so that $0.06 \leq h \leq 25.05$; for indices with range from 0 to 1 (namely 2009
and 2010 HDI), we chose $a = 0.01$, $b = 25$, so that $0.01 \leq h \leq 1.01$. In some cases
$CV(h)$ attains its minimum at the right end of the grid $\mathcal{H}$. This happened both for
ARWU{1,2,3} and THES{4,6}, see Table 1, as well as for IAG{2,5} and SSI{2},
see Table 3, where the digits in braces refer to the subscript $i$ of the $x_i$ variables.
In these cases, in practice, a linear regression fit would not be worse than the fit of
the local linear kernel estimator, according to the CV criterion.

In the implementation of the CV criterion, when a local linear kernel
regression implied a row of the smoothing matrix with numerical 'divisions by zero', we
replaced it with a local mean (Nadaraya-Watson) estimator. When the latter
would also imply numerical divisions by zero, we replaced the row of the smoothing
matrix with a sample leave-one-out mean.
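
The grid search and the fallback rules can be sketched as follows, in plain Python. This is an illustration only: the averaged form of the criterion, the relative tolerance used to detect a degenerate local fit, the function names and the simulated data are assumptions, not details taken from the implementation behind the reported results.

```python
import math
import random


def loo_fit(x, y, j, h):
    """Leave-one-out Gaussian-kernel fit at x[j], with the two fallbacks."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i == j:
            continue
        d = xi - x[j]
        k = math.exp(-0.5 * (d / h) ** 2)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yi
        t1 += k * d * yi
    det = s0 * s2 - s1 * s1
    if det > 1e-12 * s0 * s2:   # local-linear fit is well defined (assumed tolerance)
        return (s2 * t0 - s1 * t1) / det
    if s0 > 0.0:                # fall back to the local mean (Nadaraya-Watson)
        return t0 / s0
    others = [yi for i, yi in enumerate(y) if i != j]
    return sum(others) / len(others)  # fall back to the leave-one-out sample mean


def cv_criterion(x, y, h):
    """Average squared leave-one-out prediction error for bandwidth h."""
    n = len(x)
    return sum((y[j] - loo_fit(x, y, j, h)) ** 2 for j in range(n)) / n


def bandwidth_grid(a, b, n_points=50, u_min=0.1, u_max=5.0):
    """Grid h = a + u^2 / b for a regular grid of u values."""
    step = (u_max - u_min) / (n_points - 1)
    return [a + (u_min + i * step) ** 2 / b for i in range(n_points)]


# Hypothetical data on [0, 1] with a clearly nonlinear regression function.
rng = random.Random(2)
x = [j / 59 for j in range(60)]
y = [math.sin(2.0 * math.pi * xv) + 0.1 * rng.gauss(0.0, 1.0) for xv in x]

grid = bandwidth_grid(a=0.01, b=25.0)
h_cv = min(grid, key=lambda h: cv_criterion(x, y, h))
print(round(grid[0], 4), round(grid[-1], 4), round(h_cv, 4))
```

If the selected bandwidth coincided with the last grid value, the criterion would not favour the local-linear fit over a linear one; for the strongly nonlinear data simulated here the minimum falls inside the grid.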

Fig. 2. 2009 HDI ($y$), Adult literacy ($x_2$). See caption of Fig. 1.

An alternative choice of bandwidth is given by plug-in rules. One popular choice
is given by the 'direct plug-in' selector $h_{DPI}$ introduced by Ruppert et al. (1995),
which minimizes the Asymptotic Mean Integrated Squared Error for the local linear
Gaussian kernel smoother, on the basis of the following preliminary estimators. Let
$\theta_{rs} := E(m^{(r)}(x) m^{(s)}(x))$, where $m^{(r)}(x)$ is the $r$-th derivative of $m(x)$. The range of
$x_i$ is partitioned into $N$ blocks and a quartic is fitted on each block. Using this
estimation, an estimate for $\theta_{24}$ is found, along with an estimator for the error
variance $\sigma^2 := E(y_j - m(x_{ji}))^2$. These estimates are then used to obtain a
plug-in bandwidth $g$, which is used in a local cubic fit to estimate $\theta_{22}$ and to obtain
a different plug-in bandwidth $\lambda$. The $\lambda$ bandwidth is then used in a final local
linear kernel smoother to estimate $\sigma^2$, which is fed into the final formula for $h_{DPI}$,
along with the previous estimate of $\theta_{22}$. The choice of $N$, the number of blocks,
is obtained minimizing Mallows' $C_p$ criterion over the set $\{1, 2, \dots, N_{\max}\}$, where
$N_{\max} = \max\{\min(\lfloor n/20 \rfloor, N^*), 1\}$.

In the application we chose N* = 5 as suggested by Ruppert et al. (1995); in
case of numerical instabilities, we decreased N* to 4. Moreover we performed an
$\alpha$-trimming in the estimation of $\theta_{24}$ and $\theta_{22}$ with $\alpha = 0.05$. Because the choice of
bandwidth can be affected by values at the end of the x-range, we only considered
pairs of observations for which x > 0 in the choice of bandwidth, both for the CV
criterion and the DPI criterion.
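
Two of the simpler ingredients above, the cap on the number of blocks and the restriction to pairs with x > 0, can be sketched in a few lines. The function names are hypothetical; the sample sizes used in the example are the n values reported in Tables 1 and 3.

```python
def max_blocks(n, n_star=5):
    """N_max = max{min(floor(n / 20), N*), 1}."""
    return max(min(n // 20, n_star), 1)

def positive_pairs(x, y):
    """Keep only the (x, y) pairs with x > 0, as done before bandwidth selection."""
    return [(xi, yi) for xi, yi in zip(x, y) if xi > 0]

# n = 142 (2009 HDI) and n = 53 (IAG) are taken from Tables 1 and 3.
blocks_hdi = max_blocks(142)   # floor(142 / 20) = 7, capped at N* = 5
blocks_iag = max_blocks(53)    # floor(53 / 20) = 2
```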

The resulting choice of bandwidth $h_{DPI}$ was sometimes very close to $h_{CV}$, as
in the case of the 2009 HDI, which is depicted in Fig. 1-4, where each figure refers
to one of the four $x_i$ indicators used in the construction of the 2009 HDI. Fig. 1
refers to the $x_1$ indicator (life expectancy), and contains four panels, which report
- counterclockwise from upper-right - the p-value of the linearity test introduced
below, the cross validation criterion CV, the $S_i$ measure and the regression cross
plot. The first 3 graphs show functions of the bandwidth parameter h, while the
final one has the values of $x_i$ on the horizontal axis. Fig. 2-4 have the same format,
and refer to indicators $x_2$, $x_3$ and $x_4$.

Tables 1 and 3 report the selected values of $h_{CV}$ and $h_{DPI}$ for the 2009 HDI
and for the other 5 indices, described in detail in Section 4. It can be seen that the
values of $h_{CV}$ sometimes differed from $h_{DPI}$ by several orders of magnitude.

As in many other contexts, in the estimation of main effects $S_i$ the linear case
is a relevant reference model, and one would like to address inference on $S_i$ and
on the possible linearity of $f_i(x_i)$ jointly. To this end we implemented the test for
linearity proposed in Bowman and Azzalini (1997, Chapter 5). The fit of the local linear
kernel smoother can be represented as $\hat{y} = Sy$, where the matrix S depends on
all values $x_{ji}$, j = 1, ..., n. A test of linearity can be based on the F statistic,
$F := (RSS_0 - RSS_1)/RSS_1$, that compares the residual sum of squares under the
linearity assumption $RSS_0$ with the one corresponding to the local linear kernel
smoother $RSS_1$. Letting $F_{obs}$ indicate the observed value of the statistic, the p-value of the
test is computed as the probability that $z'Cz > 0$, where z is a vector of independent
standard Gaussian random variables and $C := M(I - (1 + F_{obs})A)M$ with $A =
(I - S)'(I - S)$, $M = I - X(X'X)^{-1}X'$ and X equal to the linear regression design
matrix, with first column equal to the constant vector and the second column equal
to the values of $x_{ji}$, j = 1, ..., n.
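
The construction of the statistic and of the matrix C can be sketched as below. Note the swap: the probability that z'Cz > 0 is estimated here by plain Monte Carlo on the eigenvalues of C, which is not the moment-matching approximation used in the paper. The Gaussian kernel, the number of draws, the seed and the synthetic data are all assumptions of this sketch.

```python
import numpy as np

def local_linear_smoother(x, h):
    """Smoothing matrix S with fitted values S @ y (Gaussian kernel, assumed)."""
    n = len(x)
    S = np.empty((n, n))
    for j in range(n):
        d = x - x[j]
        w = np.exp(-0.5 * (d / h) ** 2)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        S[j] = w * (s2 - s1 * d) / (s0 * s2 - s1 ** 2)
    return S

def linearity_pvalue(x, y, h, draws=20000, seed=0):
    """Return (F_obs, Monte Carlo estimate of P(z'Cz > 0))."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])              # linear regression design
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    R = np.eye(n) - local_linear_smoother(x, h)
    A = R.T @ R
    rss0 = float(y @ M @ y)                           # residual SS, linear fit
    rss1 = float(y @ A @ y)                           # residual SS, smoother
    f_obs = (rss0 - rss1) / rss1
    C = M @ (np.eye(n) - (1.0 + f_obs) * A) @ M
    lam = np.linalg.eigvalsh((C + C.T) / 2.0)         # z'Cz = sum_i lam_i * z_i^2
    z2 = np.random.default_rng(seed).standard_normal((draws, n)) ** 2
    return f_obs, float(np.mean(z2 @ lam > 0.0))

# Synthetic, clearly non-linear example (hypothetical data).
rng = np.random.default_rng(1)
x_demo = np.linspace(0.0, 1.0, 60)
y_demo = np.sin(2.0 * np.pi * x_demo) + 0.1 * rng.standard_normal(60)
f_demo, p_demo = linearity_pvalue(x_demo, y_demo, h=0.08)
```

For a relation this far from linear the estimated p-value should be essentially zero; on data close to linear it would be spread over the unit interval.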



Bowman and Azzalini (1997) suggest approximating the quantiles of the quadratic
form with the distribution of $a\chi^2_b + c$, where a, b and c are obtained by matching
moments of the quadratic form and the $a\chi^2_b + c$ distribution; here $\chi^2_b$ represents a
$\chi^2$ distribution with b degrees of freedom. We implemented this approximation; the
upper right panels in Fig. 1-4 report the resulting p-values of the test as a function of
h for the 2009 HDI. It can be seen that for some $x_i$ variables the test rejects the linearity
hypothesis for all values of h in the grid H, and for some other pairs the test rejects
only for a subset of H. In a few other pairs, the test never rejects for any $h \in H$.
Results for the linearity test are reported in Tables 1 and 3 for selected values of h,
both for the 2009 HDI and for the 5 other indices, described in detail in Section 4.

Fig. 4. 2009 HDI (y), GDP per capita ($x_4$). See caption of Fig. 1.

To show the sensitivity of the main effects $S_i$ to the smoothing parameter h, we also
computed the $S_i(h)$ index as a function of h. We also recorded the min and max
values obtained for $S_i(h)$ varying h in H; we denote these values as $S_{i,min}$, $S_{i,max}$.
We report the plot of $S_i(h)$ as a function of h in the lower left panels of Fig. 1-4.



3.3. Comparing weights and main effects 

In this section we compare revealed or target relative importance measures $\zeta_i^2$ with
the relative main effects $S_i/S_1$. First notice that, in the independent case, $S_i =
w_i^2\sigma_{ii}/\sum_{h=1}^{k} w_h^2\sigma_{hh}$, so that $S_i/S_1 = w_i^2\sigma_{ii}/(w_1^2\sigma_{11})$. When the $x_i$ variables are
standardized, all $\sigma_{ii} = 1$ and hence $S_i/S_1 = w_i^2/w_1^2$. The relative main effects $S_i/S_1$
do not reduce to $\zeta_i^2 = w_i/w_1$, except in the homoskedastic case ($\sigma_{ii} = \sigma_{11}$) when the
nominal weights are equal ($w_i = w_1$), so that $w_i^2\sigma_{ii}/(w_1^2\sigma_{11}) = w_i^2/w_1^2 = 1 = w_i/w_1$.
In the general case, $S_i$ depends on w and $\Sigma$ in a more complicated way, and hence
there is no reason, a priori, to expect $S_i/S_1$ to coincide with $\zeta_i^2$.
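
The independent case can be written out step by step. The derivation below is a sketch that assumes the linear aggregation $y = \sum_h w_h x_h$ and the reading of the main effect as the correlation ratio $S_i = V(E(y \mid x_i))/V(y)$; the symbol $\mu_h$ for the mean of $x_h$ is introduced here for illustration only.

```latex
\begin{align*}
E(y \mid x_i) &= w_i x_i + \sum_{h \neq i} w_h \mu_h
  && \text{(independence of the } x_h\text{)} \\
V\bigl(E(y \mid x_i)\bigr) &= w_i^2 \sigma_{ii},
  \qquad V(y) = \sum_{h=1}^{k} w_h^2 \sigma_{hh} \\
S_i &= \frac{w_i^2 \sigma_{ii}}{\sum_{h=1}^{k} w_h^2 \sigma_{hh}},
  \qquad
  \frac{S_i}{S_1} = \frac{w_i^2 \sigma_{ii}}{w_1^2 \sigma_{11}}
\end{align*}
```

With standardized variables and, say, $w_1 = 0.4$ and $w_2 = 0.2$, the ratio of weights is 0.5 while the ratio of main effects is 0.25, so even in this simplest setting the two notions of importance differ.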

One can compare how the effective relative importance $S_i/S_1$ deviates from
the (revealed) target relative importance $\zeta_i^2$; to this end we define the maximal
discrepancy statistic $d_m$ as

$$ d_m := \max_{i \in \{2,\dots,k\}} \left| \zeta_i^2 - \frac{S_i}{S_1} \right| . \qquad (6) $$



In the case of revealed target relative importance, recall that $w_1$ is assumed to
be the highest nominal weight $w_{max}$. In the case when more than one variable has
maximum weight equal to $w_{max}$, we selected as reference variable the one with the
maximum value for $S_\iota(h_{\iota,DPI})$ with $\iota \in \{1, \dots, k\}$, i.e. $\iota = \arg\max_{i \in \{1,\dots,k\}} S_i(h_{i,DPI})$,
where $h_{i,DPI}$ is the DPI bandwidth choice for indicator i.
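
The statistic in (6) and the tie-breaking rule for the reference variable can be sketched as follows, reading (6) as the largest absolute gap between $w_i/w_1$ and $S_i/S_1$. The function names are hypothetical; the numbers are the rounded 2009 HDI entries of Table 2, so results agree with Table 5 only up to rounding.

```python
def reference_index(weights, s_dpi):
    """Among the variables with maximum weight, pick the one with largest S_i at h_DPI."""
    top = max(weights)
    candidates = [i for i, w in enumerate(weights) if w == top]
    return max(candidates, key=lambda i: s_dpi[i])

def max_discrepancy(weights, main_effects, ref):
    """d_m = max over i != ref of | w_i / w_ref - S_i / S_ref |."""
    w_ref, s_ref = weights[ref], main_effects[ref]
    return max(
        abs(w / w_ref - s / s_ref)
        for i, (w, s) in enumerate(zip(weights, main_effects))
        if i != ref
    )

# 2009 HDI: life expectancy, adult literacy, enrolment, GDP per capita (Table 2).
w_hdi = [0.33, 0.22, 0.11, 0.33]
s_cv_hdi = [0.80, 0.78, 0.81, 0.84]
s_dpi_hdi = [0.80, 0.77, 0.78, 0.84]

ref_hdi = reference_index(w_hdi, s_dpi_hdi)        # GDP per capita
d_m_cv_hdi = max_discrepancy(w_hdi, s_cv_hdi, ref_hdi)
```

With these inputs the largest gap comes from enrolment in education, and the value is about 0.63, in line with the cross validation entry for the 2009 HDI in Table 5.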

The higher the value of $d_m$, the more discrepancy there is between relative target
importance and the corresponding relative main effects. In $d_m$ we have chosen to
capture the discrepancy by focusing on the maximal deviation; alternatively one
can consider any absolute power mean, f-divergence function or distance between
the (un-normalized) distributions $\{\zeta_i^2\}$ and $\{S_i/S_1\}$. For simplicity, in the following
we indicate these distributions used in the comparison as $\{\zeta_i^2\}$ and $\{S_i\}$.

Because $d_m$ depends on the choice of bandwidth parameters h in the estimation
of $S_i(h)$, i = 1, ..., k, we also calculated bounds on the variation of $d_m$ obtained by
varying h. Specifically, we computed $d_m$ comparing $\{\zeta_i^2\}$ with $\{S_{i,\ell_i}\}$, choosing $\ell_i$
as either equal to min or max, considering all possible combinations. For instance,
with k = 2, we considered $\{S_{1,min}, S_{2,min}\}$, $\{S_{1,min}, S_{2,max}\}$, $\{S_{1,max}, S_{2,min}\}$ and
$\{S_{1,max}, S_{2,max}\}$. Within the distribution of values of $d_m$ obtained in this way,
we recorded the minimum and the maximum, denoted as $d_{m,min}$, $d_{m,max}$. Table
5 reports the $d_m$ for h equal to $h_{DPI}$, $h_{CV}$ and in the linear case, along with the
values $d_{m,min}$, $d_{m,max}$, which provide a measure of sensitivity of $d_m$ with respect to
the choice of bandwidth h.
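
The enumeration of all min/max combinations can be sketched with `itertools.product`. The function name is hypothetical, the reference variable is assumed to be fixed beforehand, and the inputs are the rounded ARWU entries of Table 2.

```python
from itertools import product

def discrepancy_bounds(weights, s_min, s_max, ref):
    """Smallest and largest d_m over all 2^k choices of S_i in {S_i,min, S_i,max}."""
    k = len(weights)
    values = []
    for choice in product((0, 1), repeat=k):         # 0 -> min, 1 -> max
        s = [s_max[i] if c else s_min[i] for i, c in enumerate(choice)]
        values.append(max(
            abs(weights[i] / weights[ref] - s[i] / s[ref])
            for i in range(k) if i != ref
        ))
    return min(values), max(values)

# 2008 ARWU, rounded entries of Table 2; reference: articles in Nature and Science.
w_arwu = [0.10, 0.20, 0.20, 0.20, 0.20, 0.10]
s_min_arwu = [0.65, 0.72, 0.85, 0.88, 0.64, 0.72]
s_max_arwu = [0.76, 0.80, 0.90, 0.94, 0.90, 0.88]
d_lo, d_hi = discrepancy_bounds(w_arwu, s_min_arwu, s_max_arwu, ref=3)
```

On these rounded inputs the bounds come out at roughly 0.27 and 0.50, against 0.26 and 0.50 in Table 5; the small gap at the lower end is presumably due to the two-decimal rounding of the table entries.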

To compare the $S_i$ values with the weights $w_i$ graphically, in Fig. 5 we re-scale
the $S_i$ values to have sum equal to one, considering $S_i^* := S_i/c$ with $c := \sum_{t=1}^{k} S_t$,
which we call 'normalized $S_i$'. In order to visualise bounds for $S_i^*$, we plot bars
with endpoints equal to $S_{i,min}/c$ and $S_{i,max}/c$; these bars inform on the sensitivity
of $S_i^*$ with respect to the variation of the bandwidth parameter h.
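
The rescaling used for the figure can be sketched as follows; the function name is hypothetical and the inputs are the rounded ARWU cross validation entries of Table 2.

```python
def normalise(main_effects, s_min, s_max):
    """Return S_i* = S_i / c and the bar endpoints (S_i,min / c, S_i,max / c)."""
    c = sum(main_effects)
    star = [s / c for s in main_effects]
    bars = [(lo / c, hi / c) for lo, hi in zip(s_min, s_max)]
    return star, bars

# 2008 ARWU, cross validation estimates and bounds from Table 2.
s_cv_arwu = [0.65, 0.72, 0.85, 0.88, 0.70, 0.76]
s_lo_arwu = [0.65, 0.72, 0.85, 0.88, 0.64, 0.72]
s_hi_arwu = [0.76, 0.80, 0.90, 0.94, 0.90, 0.88]
star_arwu, bars_arwu = normalise(s_cv_arwu, s_lo_arwu, s_hi_arwu)
```

The normalised values run from about 0.14 to about 0.19, which is the range quoted for ARWU in Section 4.1. Because the bars are divided by the same constant c, their endpoints need not sum to one.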



3.4. Reverse-engineering the weights 

This section discusses when it is possible to find nominal weights $w_i$ that imply
pre-determined, given values $z_i^2$ for the relative main effects $S_i/S_1$; here we indicate
the target relative importance $z_i^2$ to differentiate it from $\zeta_i^2$ of the previous
sections. This reverse-engineering exercise can help developers of composite indicators
to anticipate criticism by enquiring if the stated relative importance of pillars or
indicators is actually attainable.

For the purpose of this inversion, we consider the case of $f_i(x_i)$ linear in $x_i$;
in this case $S_i$ coincides with $R_i^2$, the square of Pearson's product moment
correlation coefficient between y and $x_i$. The linear case can be seen as a first order
approximation to the nonlinear general case; this choice is motivated by the fact
that one can find an exact solution to the inversion problem of the map from $w_i$
to $R_i^2/R_1^2$ when one allows weights $w_i$ also to be negative. One expects the
reverse-engineering formula in the linear case to be indicative of the one based on a
non-linear approach, where the latter would be computationally more demanding.

We wish to find a value $w^* := (w_1^*, \dots, w_k^*)'$ for the vector of nominal weights
$w := (w_1, \dots, w_k)'$ such that $R_i^2/R_1^2$ equals pre-selected target values $z_i^2$, for i =
1, ..., k. We call this the 'inverse problem'. The weights $w^*$ are chosen to sum to
1, but they are allowed also to be negative; this choice makes the inverse problem
solvable, and in the Appendix we show that it has a unique solution, given by

$$ w^* = \frac{1}{1'\Sigma^{-1}g}\,\Sigma^{-1}g, \qquad (7) $$

where g is a vector with i-th entry equal to $g_i := z_i\sqrt{\sigma_{ii}/\sigma_{11}}$ and 1 is a k-vector of
ones.
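
A small numerical sketch of the inverse problem follows, reading (7) as $w^* = \Sigma^{-1}g / (1'\Sigma^{-1}g)$ with $g_i = z_i\sqrt{\sigma_{ii}/\sigma_{11}}$. The covariance matrix and the targets are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def inverse_weights(Sigma, z2):
    """Weights summing to one that give R_i^2 / R_1^2 = z2[i] under linear aggregation."""
    Sigma = np.asarray(Sigma, dtype=float)
    sd = np.sqrt(np.diag(Sigma))
    g = np.sqrt(np.asarray(z2, dtype=float)) * sd / sd[0]
    v = np.linalg.solve(Sigma, g)          # Sigma^{-1} g without forming the inverse
    return v / v.sum()

def relative_r2(Sigma, w):
    """R_i^2 / R_1^2 between y = sum_h w_h x_h and each x_i."""
    Sigma = np.asarray(Sigma, dtype=float)
    cov_xy = Sigma @ w
    r2 = cov_xy ** 2 / (np.diag(Sigma) * (w @ Sigma @ w))
    return r2 / r2[0]

# Hypothetical standardized indicators with strong positive correlations.
Sigma_toy = np.array([[1.00, 0.70, 0.80],
                      [0.70, 1.00, 0.75],
                      [0.80, 0.75, 1.00]])
z2_toy = np.array([1.0, 2.0 / 3.0, 1.0 / 3.0])   # targets relative to the first variable
w_star = inverse_weights(Sigma_toy, z2_toy)
```

In this toy case the third weight comes out negative: with correlations this strong, the third variable cannot be made one third as important as the first using nonnegative weights, which is the kind of non-attainability discussed next.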

Because the solution to this inverse problem is unique, if some of the weights $w^*$
in (7) are negative, it means that a solution to the inverse problem with all positive

Table 1. Bandwidth choice at indicator level. Bandwidth h: $h_{i,CV}$ (cross
validation), $h_{i,DPI}$ (direct plug-in, DPI); p-values for the linearity test:
$p_{i,CV}$ (p-value for $h_{i,CV}$), $p_{i,DPI}$ (p-value for $h_{i,DPI}$); n is the number
of observations with $x_{ji} > 0$ used for CV and DPI; *: right end of the grid H.

                                  h_CV     p_CV    h_DPI   p_DPI     n
2008 ARWU
  Alumni winning Nobel            25.05*   0.88    3.43    0.71    198
  Staff winning Nobel             25.05*   0.59    3.13    0.27    135
  Highly cited res.               25.05*   0.00    1.15    0.00    424
  Art. in Nature and Science       9.05    0.00    1.78    0.00    494
  Art. in Science and Social CI    2.94    0.00    2.26    0.00    503
  Academic perf. (size adj)        1.74    0.00    2.12    0.00    503
2008 THES
  Academic review                  4.46    0.00    1.74    0.00    400
  Recruiter review                 5.81    0.00    2.62    0.00    400
  Teacher/Student ratio            4.46    0.07    4.76    0.08    399
  Citations per faculty           25.05*   0.04    2.44    0.20    400
  International staff              6.81    0.04    2.97    0.22    398
  International students          25.05*   0.18    4.13    0.65    399
2009 HDI
  Life expectancy                  0.09    0.00    0.08    0.00    142
  Adult literacy                   0.05    0.00    0.07    0.00    142
  Enrolment in education           0.04    0.00    0.06    0.00    142
  GDP per capita                   0.09    0.00    0.08    0.00    142



weights does not exist, and hence the targets $z_i^2$ are not attainable, owing to the data
covariance structure. This can help designers to re-formulate their targets to make
them attainable, and the stakeholders involved in the use of the composite indicator
to evaluate whether the individual indicators can have the stated importance by an
appropriate choice of weights.



4. Case studies 

In this section we apply the statistical analysis that was described in Section 3 to
the six composite indicators. In Section 4.1 we consider the three indices for which
aggregation was performed at indicator level and in Section 4.2 we consider the
three indices for which aggregation was performed at the pillar level.



4.1. Importance at the indicator level

We consider the Human Development Index (HDI) and two well known composite 
indicators of university performance: the Academic Ranking of World Universities 
by Shanghai's Jiao Tong University (ARWU) and the one associated to the UK's 
Times Higher Education Supplement (THES). 



University Ranking

The ARWU, Center for World-Class Universities (2008), summarizes quality of
education, quality of faculty, research output and academic performance of world
universities using six indicators: the number of alumni of an institution having

Table 2. Main effects at indicator level. Nominal weights $w_i$; main effects $S_i$: $S_{i,lin} :=
S_i(\infty)$ (linear fit), $S_{i,CV} := S_i(h_{CV})$ (cross validation), $S_{i,DPI} := S_i(h_{DPI})$ (direct
plug-in), $S_{i,min} := \min_{h \in H} S_i(h)$, $S_{i,max} := \max_{h \in H} S_i(h)$.

                                  w_i    S_lin   S_CV   S_DPI   S_min   S_max
2008 ARWU
  Alumni winning Nobel            0.10   0.64    0.65   0.67    0.65    0.76
  Staff winning Nobel             0.20   0.72    0.72   0.73    0.72    0.80
  Highly cited res.               0.20   0.81    0.85   0.87    0.85    0.90
  Art. in Nature and Science      0.20   0.87    0.88   0.88    0.88    0.94
  Art. in Science and Social CI   0.20   0.63    0.70   0.70    0.64    0.90
  Academic perf. (size adj)       0.10   0.71    0.76   0.75    0.72    0.88
2008 THES
  Academic review                 0.40   0.77    0.81   0.82    0.78    0.85
  Recruiter review                0.10   0.45    0.54   0.54    0.46    0.62
  Teacher/Student ratio           0.20   0.19    0.21   0.20    0.18    0.42
  Citations per faculty           0.20   0.38    0.38   0.41    0.38    0.50
  International staff             0.05   0.10    0.12   0.12    0.10    0.31
  International students          0.05   0.16    0.16   0.17    0.16    0.34
2009 HDI
  Life expectancy                 0.33   0.80    0.80   0.80    0.78    0.85
  Adult literacy                  0.22   0.77    0.78   0.77    0.76    0.83
  Enrolment in education          0.11   0.77    0.81   0.78    0.73    0.86
  GDP per capita                  0.33   0.85    0.84   0.84    0.84    0.88



won Nobel Prizes or Fields Medals (weight of 10%), the number of Nobel or Fields 
laureates among the staff of an institution (weight of 20%), the number of highly 
cited researchers (weight of 20%), the number of articles published in Nature or
Science (weight of 20%), the number of articles indexed in the Science Citation Index
Expanded and Social Sciences Citation Index (weight of 20%), and finally the academic
performance measured as the weighted average of the above five indicators divided
by the number of full-time equivalent academic staff (weight of 10%). The raw data
are normalized by assigning to the best performing institution a score of 100, with
all other institutions receiving a score relative to the leader. The ARWU score is a
weighted average of the six normalized indicators, which is finally re-scaled to a
maximum of 100. The six indicators have moderate to strong correlations in the
range from 0.48 to 0.87 and an average bivariate correlation of 0.68.

The THES, Times Higher Education Supplement (2008), summarizes university
features related to research quality, graduate employability, international orienta- 
tion and teaching quality using six indicators: the opinion of academics on which 
institutions they consider to be the best in the relevant field of expertise (weight 
of 40%), the number of papers published and citations received by research staff 
(weight of 20%), the opinion of employers about the universities from which they 
would prefer to recruit graduates (weight of 10%), the percentage of overseas staff 
at the university (weight of 5%), the percentage of overseas students (weight of 
5%), and finally the ratio between the full-time equivalent faculty and the number 
of students enrolled at the university (weight of 20%). Raw data are standardized. 
The standardized indicator scores are then scaled by dividing by the best score. 
The THES score is the weighted average of the six normalized indicators, which is 
finally re-scaled to a maximum of 100. The six indicators have very low to moderate 
correlations that range from 0.01 to 0.64 and a low average bivariate correlation of 
0.24. 

Results for ARWU and THES are given in Tables 1, 2 and 5. The first two
panels of Table 1 provide the bandwidth selection results for ARWU and THES;
the corresponding panels of Table 2 give estimates of the importance measure $S_i$
for different choices of bandwidth. The first two lines in Table 5 give the maximum
discrepancy statistic $d_m$ for ARWU and THES. Finally the two upper graphs in
Fig. 5 summarize the comparison between target and actual relative importance of
indicators. For ARWU the main effects $S_i$ are more similar to each other than the
nominal weights, ranging between 0.14 and 0.19 (normalised $S_i$ values to unit
sum, cross validation estimates), whereas the weights are either 0.10 or 0.20.

The situation is worse for THES, where the combined importance of peer review
based variables (recruiters and academia) appears larger than stipulated by
developers, indirectly supporting the hypothesis of linguistic bias at times levelled at
this measure (see e.g. Saisana et al. (2011) for a review). Further, for THES the
'teachers to student ratio', a key variable aimed at capturing the teaching
dimension, is much less important than it should be when comparing normalized $S_i$ (0.09,
cross validation estimate) with the nominal weight (0.20).

Overall, there is more discrepancy between the nominal weights assigned to the
six indicators and their respective main effects in THES ($d_{m,CV} = 0.42$) than in
ARWU ($d_{m,CV} = 0.36$), cross validation estimates. Comparing this result with
the conclusions in Saisana et al. (2011), we can see the value added of the present
measure of importance. In that paper we could not draw a judgement about the
relative quality of THES with respect to ARWU. The main effects used here allow
us to say that - leaving aside the different normative frameworks about which no
statistical inference can be made - ARWU is statistically more consistent with its
declared targets than THES.

When considering the sensitivity of $d_m$ values to the choice of bandwidths h, one
can see that the range $[d_{m,min}, d_{m,max}]$ is slightly shorter for ARWU ([0.26, 0.50])
than for THES ([0.29, 0.55]); this implies that ARWU is slightly less sensitive than
THES to the choice of bandwidths h. Note however that the two ranges overlap,
so that there are choices of bandwidths h for which the ordering of $d_m$ values is
reversed. This, however, does not happen at the values $h_{CV}$ and $h_{DPI}$.

The hypothesis of linearity is not rejected for two indicators for ARWU and for
four indicators for THES, when evaluating the tests at $h_{DPI}$ and $h_{CV}$. The two
indicators for ARWU are those with the highest proportion of values equal to 0,
which were discarded in the choice of bandwidth; the numbers of valid cases n are
198 and 135 respectively. This may reflect the fact that it is more difficult to reject
linearity with smaller samples. The indicators used in THES instead do not have so
many zero values; also here however, one finds that $E(y|x_i)$ is approximately linear
for 4 indicators.



The Human Development Index 2009 

The HDI, see United Nations Development Programme (2009), summarizes human
development in 182 countries based on four indicators: a long healthy life measured 
by life expectancy at birth (weight of 1/3), knowledge measured by adult literacy 
rate (weight of 2/9) and combined primary, secondary and tertiary gross enrollment 
ratio (weight of 1/9), and a decent standard of living measured by the GDP per 
capita (weight of 1/3). Raw data in the four indicators are normalized by using 
the min-max approach to be in [0, 1]. The 2009 HDI score is the weighted average 
of the four normalized indicators. Because data on Adult literacy rate was missing 

Table 3. Bandwidth choice at pillar level. Bandwidth h: $h_{i,CV}$ (cross
validation), $h_{i,DPI}$ (DPI); p-values for the linearity test: $p_{i,CV}$ (p-value for
$h_{i,CV}$), $p_{i,DPI}$ (p-value for $h_{i,DPI}$); n is the number of observations with
$x_{ji} > 0$ used for CV and DPI; *: right end of the grid H.

                                  h_CV     p_CV    h_DPI   p_DPI     n
2010 HDI
  Life expectancy                  0.08    0.00    0.07    0.00    169
  Education                        0.02    0.09    0.06    0.21    169
  GDP per capita                   0.05    0.00    0.06    0.00    169
IAG
  Safety and security             17.69    0.15    3.31    0.45     53
  Rule of law and corruption      25.05*   0.30    4.75    0.94     53
  Part. and human rights           4.89    0.08    2.85    0.41     53
  Sust. economic opportunity      22.14    0.09    4.21    0.51     53
  Human development               25.05*   0.17    3.42    0.87     53
SSI
  Personal development             0.69    0.00    0.37    0.00    151
  Healthy environment             25.05*   0.41    0.49    0.69    151
  Well-balanced society            0.69    0.00    0.42    0.01    151
  Sustainable use of resources     0.30    0.00    0.30    0.00    150
  Sustainable World                0.86    0.00    0.38    0.01    151



for several countries, we analyzed data only for the countries without missing data; 
this gave a total of 142 countries. The four indicators present strong correlations 
that range from 0.70 to 0.81 and an average bivariate correlation of 0.74. 

Nominal weights and estimates of the main effects are given in the last panel
of Table 2, while the choice of bandwidth is given in Table 1. The maximum
discrepancy is given in Table 5 and a graphical comparison of nominal weights and
estimates of the main effects is provided in Fig. 5. Table 1 reports evidence on the
choice of bandwidth h and the p-values for the linearity test, at the values $h_{DPI}$
and $h_{CV}$ of the smoothing parameter h.

Both the main effects Si and the Pearson correlation coefficients reveal a rela- 
tively balanced impact of the four indicators 'life expectancy', 'GDP per capita', 
'enrolment in education', and 'adult literacy' on the variance of the HDI scores, with 
the adult literacy being slightly less important. It would seem that the HDI depends
more equally on its four variables than the weights assigned by the developers
would imply. For example, if one could fix adult literacy the variance of the HDI
scores would on average be reduced by 77% (CV estimate), whereas by fixing the 
most influential indicator, GDP per capita, the variance reduction would be 84% 
on average. 

One might suspect that it was precisely the developers' intention, when assigning
nominal weights 11% and 33% to these two variables respectively, to make them
equally important on the basis of the $S_i$ measure; however this is not stated
explicitly in the index documentation report, United Nations Development Programme
(2009). Overall, there is considerable discrepancy between the nominal weights
assigned to the four indicators and their respective main effects in the 2009 HDI
($d_{m,CV} = 0.63$).

The analysis of the 2009 HDI illustrates vividly that assigning unequal weights to 
the indicators is not a sufficient condition to ensure unequal importance. Although 
the 2009 HDI developers assigned weights varying between 11% and 33%, all four 
indicators are roughly equally important. The scatterplots in Fig. 1-4 help visualize

Table 4. Main effects at pillar level. Nominal weights $w_i$; main effects $S_i$: $S_{i,lin} :=
S_i(\infty)$ (linear fit), $S_{i,CV} := S_i(h_{CV})$ (cross validation), $S_{i,DPI} := S_i(h_{DPI})$ (direct
plug-in), $S_{i,min} := \min_{h \in H} S_i(h)$, $S_{i,max} := \max_{h \in H} S_i(h)$.

                                  w_i    S_lin   S_CV   S_DPI   S_min   S_max
2010 HDI
  Life expectancy                 0.33   0.82    0.84   0.84    0.81    0.86
  Education                       0.33   0.86    0.87   0.86    0.84    0.89
  GDP per capita                  0.33   0.90    0.90   0.90    0.89    0.93
IAG
  Safety and security             0.20   0.52    0.54   0.63    0.51    0.87
  Rule of law and corruption      0.20   0.77    0.76   0.78    0.76    1.00
  Part. and human rights          0.20   0.44    0.63   0.68    0.43    1.00
  Sust. economic opportunity      0.20   0.52    0.52   0.56    0.52    0.98
  Human development               0.20   0.50    0.50   0.55    0.49    0.94
SSI
  Personal development            0.13   0.05    0.14   0.17    0.04    0.27
  Healthy environment             0.13   0.04    0.04   0.07    0.04    0.27
  Well-balanced society           0.13   0.13    0.21   0.21    0.12    0.32
  Sustainable use of resources    0.30   0.48    0.64   0.64    0.47    0.72
  Sustainable World               0.30   0.02    0.06   0.10    0.02    0.29



the situation. In cases like this, where the variables are strongly and roughly equally 
correlated with the overall index, each of them ranks the countries roughly equally, 
and the weights are little more than cosmetic. 



4.2. Importance at the pillar level 

The issue of weighting is particularly fraught with normative implications in the 
case of pillars. As mentioned above, pillars in composite indicators are often given 
equal weights on the ground that each pillar represents an important - possibly 
normative - dimension which could not and should not be seen to have more or less 
weight than the stipulated fraction. The discrepancy measure presented here can 
be of particular relevance and interest to gauge the quality of a composite indicator 
with respect to this important assumption. Here we consider the 2010 version of 
the HDI, the Index of African Governance (IAG) and the Sustainable Society Index 
(SSI). 



The Human Development Index 2010 

In this section we analyze the 2010 version of the HDI at the pillar level, covering 
169 countries. From the methodological viewpoint the main novelty in this version 
of the index is the use of a geometric - as opposed to an arithmetic - mean, in 
the aggregation of the three pillars. The three pillars cover health (life expectancy 
at birth) $x_{life}$, education $x_{edu}$ and income (gross national income per capita) $x_{inc}$.
Education is the combination of two variables, namely mean years of schooling and
expected years of schooling, see United Nations Development Programme (2010).

The 2010 HDI index y is computed as

$$ y = (x_{life} \cdot x_{edu} \cdot x_{inc})^{1/3}, $$

where all three dimensions have equal weights. The reason for this change of ag- 
gregation scheme is to introduce an element of 'imperfect substitutability across all 

Table 5. Maximum discrepancy statistic $d_m$ for different choices
of the bandwidth h in the main effect estimator $S_i(h)$. DPI: direct
plug-in, CV: cross validation, lin: linear fit; min and max
values were obtained by considering all possible combinations
of $S_{i,\ell_i}$ values, where $\ell_i$ = min, max.

            d_m,DPI   d_m,CV   d_m,lin   d_m,min   d_m,max
ARWU         0.36      0.36     0.31      0.26      0.50
THES         0.41      0.42     0.34      0.29      0.55
2009 HDI     0.59      0.63     0.57      0.50      0.69
2010 HDI     0.06      0.07     0.09      0.03      0.13
IAG          0.29      0.34     0.42      0.13      0.57
SSI          0.85      0.91     0.95      0.38      0.98



HDI dimensions', i.e. to reduce the compensatory nature of the linear aggregation,
see United Nations Development Programme (2010, p. 216).
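
The effect of the geometric mean on substitutability can be illustrated with a toy comparison. The pillar scores below are hypothetical values in (0, 1], and the function names are introduced here for illustration.

```python
def hdi_geometric(life, edu, inc):
    """Equal-weight geometric mean of the three pillar scores."""
    return (life * edu * inc) ** (1.0 / 3.0)

def hdi_arithmetic(life, edu, inc):
    """Equal-weight arithmetic mean, the linear counterfactual."""
    return (life + edu + inc) / 3.0

balanced = (0.6, 0.6, 0.6)   # hypothetical country with even pillars
uneven = (0.9, 0.6, 0.3)     # same arithmetic mean, uneven pillars

geo_balanced = hdi_geometric(*balanced)
geo_uneven = hdi_geometric(*uneven)
lin_uneven = hdi_arithmetic(*uneven)
```

Both profiles have the same arithmetic mean, but the geometric mean of the uneven profile is lower, so a weak pillar is only partly compensated by a strong one.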

Nominal weights and estimates of the main effects are given in the first panel
of Table 4, while the choice of bandwidth is given in Table 3. The maximum
discrepancy is given in row 4 of Table 5 and a graphical comparison of nominal
weights and estimates of the main effects is provided in Fig. 5.

Overall, the 2010 HDI shows very little discrepancy between the goal of equal
importance of the three pillars and the main effects. In fact all three pillars have
similar impact on the index variance (roughly 84-90%). Hence, in this case the
relative nominal weights are approximately equal to the relative impact of the
pillars on the index variance. Such a correspondence is of value because it indicates
that no pillar affects the variance of the index too much or too little as compared
to its 'declared' equal importance. Compared to the other examples discussed, the
2010 HDI is the most consistent in this respect ($d_{m,CV} = 0.07$). The linearity tests
reveal that the role of education is approximately linear within the index, despite
the multiplicative aggregation scheme.

In order to assess the impact of the choice of the aggregation scheme on the 
index balance, we also perform a counterfactual analysis of the 2010 HDI using 
linear aggregation of the three dimensions. We find that this choice does not affect 
the relative importance of dimensions, as these have comparable variances and 
covariances. Hence the 2010 HDI would have been balanced also under a linear 
aggregation scheme. This, however, does not detract from the conceptual appeal of 
imperfect substitutability implicit in geometric aggregation. 



Index of African Governance 

The Index of African Governance was developed by the Harvard Kennedy School,
see Rotberg and Gisselquist (2008); for a validation study see Saisana et al. (2009).
In the 2008 version of the index, 48 African countries are ranked according to five
pillars: (i) Safety and Security, (ii) Rule of Law, Transparency, and Corruption,
(iii) Participation and Human Rights, (iv) Sustainable Economic Opportunity, and
(v) Human Development. The five pillars are described by fourteen sub-pillars that
are in turn composed of 57 indicators in total (in a mixture of qualitative and
quantitative variables). Raw indicator data were normalized using the min-max
method on a scale from 0 to 100. The five pillar scores per country were calculated
as the simple average of the normalized indicators. Finally, the IAG scores were
calculated as the simple average of the five pillar scores. The five pillars have

Fig. 5. Comparison of normalized main effects $S_i^*$ (in light grey) and $w_i$ (in dark grey). $S_i^* :=
S_i/c$ with $c := \sum_{t=1}^{k} S_t$ and $S_i := S_{i,CV}$ (cross validation). The indicators $x_i$ are numbered
consecutively as in Tables 1-4. Bounds for $S_i^*$ are constructed as $S_{i,min}/c$, $S_{i,max}/c$.



correlations that range from 0.096 to 0.76 and average bivariate correlation of 0.45. 
Three pairwise correlations (involving Participation and Human Rights and either 
Sustainable Economic Opportunity or Human Development or Safety & Security) 
are not statistically significant at the 5% level. 

Nominal weights and main effects are given in Table 4 and in Fig. 5, while the
choice of bandwidth is reported in Table 3 and the discrepancy statistics in Table 5.
The main conclusions are summarized as follows: The IAG is a good example of the
situation discussed in Section Q] whereby all pillars represent important normative 
elements which by design should be equally important in the developers' intention. 
Overall the IAG appears to be balanced with respect to four pillars that have similar 
impact on the index variance (roughly 50-63%), but the fifth pillar on the Rule of
Law is more influential than conceptualised ($S_i = 76\%$, cross validation estimate).
The IAG has a maximal discrepancy statistic $d_{m,CV} = 0.34$.

The linearity tests in Table 3 suggest that there is no statistical evidence against
linearity for any of the five pillars. Hence one could calculate $S_i$ here as $R_i^2$.
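
When linearity is not rejected, the main effects reduce to squared Pearson correlations between the index and each pillar. The sketch below shows that computation on made-up pillar scores; the data, the seed and the function name are hypothetical, and only the equal weights of 0.2 come from the IAG description.

```python
import numpy as np

def squared_correlations(X, w):
    """R_i^2 between the linear index y = X @ w and each column of X."""
    y = X @ w
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = (Xc.T @ yc) ** 2
    den = (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    return num / den

rng = np.random.default_rng(0)
# Hypothetical correlated scores for 53 units and 5 pillars.
pillars = rng.normal(size=(53, 5)) @ rng.normal(size=(5, 5))
weights = np.full(5, 0.2)                 # equal nominal weights, as in the IAG
r2 = squared_correlations(pillars, weights)
```

Dividing `r2` by the entry of the reference pillar gives the relative importance that enters the discrepancy statistic in the linear case.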



Sustainable Society Index 

The Sustainable Society Index (SSI) has been developed by the Sustainable Soci- 
ety Foundation for 151 countries and it is based on a definition of sustainability of 
the Brundtland Commission (van de Kerk and Manuel, 2008). Also in this example,
the five pillars of the index represent normative dimensions which are, however,
considered of different importance: Personal Development (weight of 1/7),
Healthy Environment (1/7), Well-balanced Society (1/7), Sustainable Use of
Resources (2/7), and Sustainable World (2/7). These five pillars are described by 22
indicators. Raw indicator data were normalized using the min-max method on a
scale from 0 to 10. The five pillars were calculated as the simple average of the
normalized indicators. The SSI scores were calculated as the weighted average of
the five pillar scores.

One can note that the linearity test suggests that for the second pillar 'Healthy 
Environment' there is no evidence against linearity of its relation to the SSI index. 
The five pillars have correlations that range from —0.62 to 0.75, where negative 
correlations between pillars are generally undesired, as they suggest the presence 
of trade-offs between pillars (e.g. economic performance can only come with an 
environmental cost). Such trade-offs within index dimensions are a reminder of the 
danger of compensability between dimensions. 

For the Sustainable Society Index, there are notable differences between declared
and variance-based importance for the five pillars. The different association between
a pillar and the overall index can also be grasped visually in Fig. [5]. The two
pillars on 'Sustainable Use of Resources' and on 'Sustainable World' are meant to be
equally important according to the nominal weights (2/7 each), while the main
effects suggest that the variance reduction obtained by fixing the former is 67%
compared to merely 9% by fixing the latter. This strong discrepancy is due to the
significant negative correlations present among the SSI pillars. Overall, the level of
maximal discrepancy of the SSI is the highest of the examples discussed
($d_{m,cv} = 0.91$). The authors and the developers of the SSI have been communicating
on this issue, and the 2010 version of the SSI appears considerably improved, see
http://www.ssfindex.com/ssi/.
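
The effect of negative correlations on relative importance can be illustrated with a small sketch for the linear case, where the main effect reduces to the squared Pearson correlation $R_i^2 = (e_i'\Sigma w)^2 / (\sigma_{ii}\, w'\Sigma w)$. The covariance matrix below is hypothetical and is not the SSI data; the function name is likewise an assumption made for illustration.

```python
def linear_main_effects(weights, cov):
    """Squared Pearson correlation R_i^2 between each x_i and y = sum_j w_j x_j.

    R_i^2 = (e_i' S w)^2 / (s_ii * w' S w), with S the covariance matrix.
    """
    k = len(weights)
    # S w: covariance of each variable with the index y
    cov_xy = [sum(cov[i][j] * weights[j] for j in range(k)) for i in range(k)]
    # w' S w: variance of the index y
    var_y = sum(weights[i] * cov_xy[i] for i in range(k))
    return [cov_xy[i] ** 2 / (cov[i][i] * var_y) for i in range(k)]


# Hypothetical three-pillar covariance matrix (unit variances): pillars 1 and 2
# are positively correlated, pillar 3 is negatively correlated with both.
cov = [
    [1.0, 0.6, -0.4],
    [0.6, 1.0, -0.4],
    [-0.4, -0.4, 1.0],
]
equal_weights = [1 / 3, 1 / 3, 1 / 3]
r2 = linear_main_effects(equal_weights, cov)
print([round(v, 3) for v in r2])
```

Under these assumed correlations, equal nominal weights give the third pillar a far smaller $R_i^2$ than the first two, which is the kind of mismatch between declared and effective importance discussed above.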



4.3. Reverse-engineering the weights 

Applying the reverse engineering exercise described in Section ^. 4l and the Appendix 
to our test cases (except for the 2010 HDI, which has a low maximal discrepancy
between relative weights and relative importance for the three pillars and is
not obtained by the linear aggregation scheme ([1])), we find that achieving
a relative impact of the indicators (or pillars), as measured by the square of the
Pearson correlation coefficient $R_i^2$, that equals the relative 'declared' importance
of the indicators requires negative nominal weights in all studies except for
the SSI. In the case of the SSI, to guarantee that the two pillars on Sustainable Use of
Resources and Sustainable World are twice as important as the other three pillars,
the nominal weights to be assigned to the five pillars are: Personal Development
(0.19), Healthy Environment (0.16), Well-balanced Society (0.07), Sustainable Use
of Resources (0.16), and Sustainable World (0.41). For all other cases, the data
correlation structure does not allow the developers to achieve the stated relative
importance by choosing positive weights.



5. Conclusions 



According to many, including some of the authors of the Stiglitz report, see
Stiglitz et al. (2009), composite indicators have serious shortcomings. The debate
among those who prize their pragmatic nature in relation to pragmatic problems,
see Hand (2009), and those who consider them an aberration is unlikely to be
settled soon, see Saltelli (2007) for a review of pros and cons. Still these measures
are pervasive in the public discourse and represent perhaps the best known face of
statistics in the eyes of the general public and media.

One might muse that what official statistics are to the consolidation of the modern
nation state, see Hacking (1990), composite indicators are to the emergence of
post-modernity, meaning by this the philosophical critique of the exact science
and rational knowledge programme of Descartes and Galileo, see Toulmin (1990),
pp. 11-12. On a practical level, it is undeniable that composite indicators give voice
to a plurality of different actors and normative views. The authors in Stiglitz et al.
(2009) remark (p. 65):



"The second [argument against composite indicators] is a general criticism that 
is frequently addressed at composite indicators, i.e. the arbitrary character of the 
procedures used to weight their various components. (...) The problem is not that 
these weighting procedures are hidden, non-transparent or non-replicable — they are 
often very explicitly presented by the authors of the indices, and this is one of the 
strengths of this literature. The problem is rather that their normative implications 
are seldom made explicit or justified."



The analysis of this paper shows that, although the weighting procedures are often 
very explicitly presented by the authors of the indices, the implications of these 
are neither fully understood, nor assessed in relation to the normative implications. 
This paper proposes a variance-based tool to measure the internal discrepancy of a 
composite indicator between target and effective importance. 

Our main conclusions can be summarized as follows. For transparency and sim- 
plicity, composite indicators are most often built using linear aggregation procedures 
which are fraught with the difficulties described in the Introduction: practitioners 
know that weights cannot be used as importance, while they are precisely elicited as 
if they were. Weights are instead measures of substitutability in linear aggregation. 
The error is particularly severe when a variable's weight substantially deviates from 
its relative strength in determining the ordering of the units (e.g. countries) being 
measured. 

Pearson's correlation ratio (or main effect) that is suggested in this paper is a 
suitable measure of importance of a variable (be it indicator or pillar) because: i) it 
offers a precise definition of importance (that is 'the expected reduction in variance 
of the composite indicator that would be obtained if a variable could be fixed'), ii) 
it can be used regardless of the degree of correlation between variables, iii) it is
model-free, in that it can be applied also in non-linear aggregations, and finally iv) 
it is not invasive, in that no changes are made to the composite indicator or to the 
correlation structure of the indicators. 
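
As a rough illustration of property iii), the sketch below estimates the correlation ratio $S_i = \mathrm{Var}(E[y \mid x_i]) / \mathrm{Var}(y)$ by averaging $y$ within quantile bins of $x_i$. Binning is a deliberately simple stand-in chosen for this sketch and is not the estimator used in the applications of this paper; the data are synthetic and the function names are assumptions.

```python
import random
from statistics import fmean, pvariance


def main_effect_binned(x, y, n_bins=20):
    """Estimate S = Var(E[y | x]) / Var(y) using quantile bins of x.

    Binning is a crude stand-in (an assumption of this sketch) for a
    nonparametric regression of y on x.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices sorted by x
    grand_mean = fmean(y)
    between = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:
            bin_mean = fmean(y[i] for i in idx)  # estimate of E[y | x in bin]
            between += len(idx) * (bin_mean - grand_mean) ** 2
    return (between / n) / pvariance(y)


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient between x and y."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Synthetic data (hypothetical): a non-linear index of two independent variables.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(4000)]
x2 = [random.gauss(0, 1) for _ in range(4000)]
y = [a ** 2 + 0.5 * b for a, b in zip(x1, x2)]

s1 = main_effect_binned(x1, y)  # large: y depends strongly on x1
s2 = main_effect_binned(x2, y)  # small: x2 explains little of Var(y)
r2_x1 = pearson_r2(x1, y)       # near zero: the dependence on x1 is not linear
print(round(s1, 2), round(s2, 2), round(r2_x1, 3))
```

In this constructed example the squared linear correlation of the first variable with the index is expected to be close to zero, while its binned main effect is large, which is why a model-free measure is preferable when the aggregation is not linear.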

Because of property i) and the fact that it takes the whole covariance structure 
into account, the main effect can also be useful to prioritise variables on which a 
country or university, or whatever units are being rated, could intervene to improve 
its overall score. Note that the indicator with highest main effect is not necessarily 
the one in which the country scores the worst. 

The main effects approach can complement the techniques for robustness analysis
applied to composite indicators thus far seen in the literature, see e.g.
Saisana et al. (2005); Organisation for Economic Co-operation and Development (2008);
Saisana et al. (2011). The approach described in this paper does not need an explicit
modelling of error propagation, but is simply based on the data as produced by developers.

The discrepancy statistic based on the absolute error between ratios of the main
effects and of the corresponding target relative importance provides a pragmatic
answer to the research question posed in this paper. Relative main effects are
variance-based, and hence they are ratios of quadratic forms of nominal weights,
while target relative importance is often deduced as a ratio of nominal weights.
The discrepancy statistic is a way to compare these two importance measures,
one of which is stated ex-ante as a target while the other is computed ex-post;
this makes it possible to see how close the two measures are in practice.

The discrepancy statistic has been effective in the six examples discussed, in that
it allowed an analytic judgement about the discrepancy in the assignment of the
weights in two well known measures of higher education performance ($d_m = 0.42$
for THES versus $d_m = 0.36$ for ARWU), two versions of a human development
index ($d_m = 0.63$ for the 2009 HDI and $d_m = 0.02$ for the 2010 HDI), one index of
governance ($d_m = 0.34$ for the IAG) and one index of sustainability ($d_m = 0.86$ for
the SSI).

Our reverse engineering analysis shows that in most cases it is not possible to 
find nominal weights that would give the desired importance to variables. This can 
be a useful piece of information to developers, and might induce a deeper reflection 
on the cost of the simplification achieved with linear aggregation. Developers could 
thus: 

a) avoid associating nominal weights with importance, but inform users of the 
relative importance of the variables or pillars, using statistics such as those 
presented in this paper; 

b) abstain from aggregating pillars when these display important trade offs which 
make it difficult to give them target weights in an aggregated index; 

c) reconsider the aggregation scheme, moving from the linear one (which is fully
compensatory) to a partially or fully non-compensatory alternative, such as
e.g. a Condorcet-like (or approximate Condorcet) approach, where weights
would fully play their role as measures of importance, see Munda (2008);

d) assess different weighting strategies, so as to select the one that leads to a
minimum discrepancy statistic between target weights and variables' importance.

Appendix - Solution to the inverse problem

In the linear case, the ratio $S_i/S_1$ equals the ratio of squares of Pearson's correlation
coefficients $R_i^2/R_1^2$; this is a function $H_i(w)$ of $w := (w_1, \dots, w_k)'$ and of the
covariance matrix $\Sigma$ of $x := (x_1, \dots, x_k)'$. One finds

$$H_i(w) = \frac{(e_i'\Sigma w)^2\, \sigma_{11}}{(e_1'\Sigma w)^2\, \sigma_{ii}},$$

where $e_i$ is the $i$-th column of the identity matrix of order $k$ and $\sigma_{ii}$ is the $i$-th
variance on the diagonal of $\Sigma$. We wish to make $H_i(w)$ equal to a pre-selected value
$z_i^2$ for all $i$,

$$H_i(w) = z_i^2, \qquad i = 1, \dots, k, \qquad (8)$$

and seek a solution $w \in \mathbb{R}^k$ to this problem such that the nominal weights
sum to 1, i.e.

$$1'w = 1. \qquad (9)$$

We show that this solution is unique and that it is given by ([7]) in the text, where $g$ is a
vector with $i$-th entry equal to $g_i := z_i \sqrt{\sigma_{ii}/\sigma_{11}}$, and $1$ is a $k$-vector of ones.

Note that by construction $g_1 = 1$. One has that ([8]) can be written as
$e_1'\Sigma w - \frac{1}{g_i} e_i'\Sigma w = 0$, or, setting $G := \mathrm{diag}(1, 1/g_2, \dots, 1/g_k)$,
$F := 1 e_1'$, as $(F - G)\Sigma w = 0$. This shows that $\Sigma w$ should be selected in the right
null space of $F - G$. We observe that

$$F - G = \begin{pmatrix} 0 & & & \\ 1 & -1/g_2 & & \\ \vdots & & \ddots & \\ 1 & & & -1/g_k \end{pmatrix},$$

whose right null space $A$ is one-dimensional; moreover $A$ is spanned by
$g := (1, g_2, \dots, g_k)'$. Hence $\Sigma w = gc$ for a nonzero $c$, or $w = \Sigma^{-1} g c$.
Substituting this expression in ([9]), one finds $1 = 1'w = 1'\Sigma^{-1} g c$, which implies
$c = 1/(1'\Sigma^{-1} g)$. One hence concludes that the weights that satisfy ([8]) are given by
([7]), and that they are unique.
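
The solution derived above, $w = \Sigma^{-1} g / (1'\Sigma^{-1} g)$, can be checked numerically with a minimal sketch. The covariance matrices and targets below are hypothetical, the function names are assumptions made for illustration, and the small linear solver is included only to keep the example free of external libraries.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse_weights(cov, z):
    """Weights w = S^{-1} g / (1' S^{-1} g), with g_i = z_i * sqrt(s_ii / s_11)."""
    g = [z[i] * sqrt(cov[i][i] / cov[0][0]) for i in range(len(z))]
    u = solve(cov, g)   # u = S^{-1} g
    total = sum(u)      # 1' S^{-1} g
    return [ui / total for ui in u]


def importance_ratios(weights, cov):
    """H_i(w) = R_i^2 / R_1^2 for the linear index y = w'x."""
    k = len(weights)
    c = [sum(cov[i][j] * weights[j] for j in range(k)) for i in range(k)]
    return [(c[i] ** 2 * cov[0][0]) / (c[0] ** 2 * cov[i][i]) for i in range(k)]


# Hypothetical case 1: three pillars, target of equal importance (z_i^2 = 1).
cov = [[1.0, 0.6, -0.4], [0.6, 1.0, -0.4], [-0.4, -0.4, 1.0]]
w_equal = inverse_weights(cov, [1.0, 1.0, 1.0])

# Hypothetical case 2: two strongly correlated variables, with the second
# targeted to be four times as important as the first (z_2^2 = 4).
cov2 = [[1.0, 0.9], [0.9, 1.0]]
w_neg = inverse_weights(cov2, [1.0, 2.0])

print([round(v, 3) for v in w_equal], [round(v, 3) for v in w_neg])
```

In the first hypothetical case all weights turn out positive, whereas in the second the unique solution assigns a negative weight to the first variable, mirroring the situation in which the correlation structure prevents the target importance from being reached with nonnegative weights.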